Smart Paper Notes

A Data Quarantine Model to Secure Data in Edge Computing

Poornima Mahadevappa, Raja Kumar Murugesan

Category: Neural and Evolutionary Computing

2021-11-15

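Edge computing provides an agile data processing platform for latency-sensitive and communication-intensive applications through a decentralized cloud and geographically distributed edge nodes. Gaining centralized control over the edge nodes can be challenging due to security issues and threats. Among several security issues, data integrity attacks can lead to inconsistent data and intrude on edge data analytics. As an attack intensifies further, it becomes challenging to mitigate it and identify its root cause. Therefore, this paper proposes the new concept of a data quarantine model to mitigate data integrity attacks by quarantining intruders. Quarantine has served as an efficient security solution in the cloud, ad-hoc networks, and computer systems, which motivates its adoption in edge computing. The data acquisition edge nodes identify intruders through dimensionality reduction and quarantine all suspicious devices. During quarantine, the proposed concept builds a reputation score to determine falsely identified legitimate devices and sanitizes their affected data to regain data integrity. As a preliminary investigation, this work identifies an appropriate machine learning method, Linear Discriminant Analysis (LDA), for dimensionality reduction. LDA results in 72.83% quarantine accuracy and a training time of 0.9 seconds, which is more efficient than other state-of-the-art methods. In the future, this will be implemented and validated with ground truth data.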
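
The abstract names LDA as the dimensionality-reduction step that separates suspicious devices from normal ones. The following is a minimal, hypothetical sketch of two-class Fisher LDA on 2-D feature vectors: it learns one discriminant axis w = Sw^-1 (m_suspicious - m_normal), projects each device's readings onto it, and flags devices on the suspicious side of a midpoint threshold. The feature values, device IDs, and midpoint threshold are illustrative assumptions; the paper's actual features and its reputation-score step are not shown.

```python
"""Hypothetical sketch: two-class Fisher LDA used to flag devices for quarantine."""


def mean(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def project(w, x):
    """Project a 2-D feature vector onto the discriminant axis w."""
    return w[0] * x[0] + w[1] * x[1]


def fisher_direction(normal, suspicious):
    """Return (w, threshold) with w = Sw^-1 (m_suspicious - m_normal)."""
    m0, m1 = mean(normal), mean(suspicious)
    # Within-class scatter matrix Sw (2x2), summed over both classes
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in ((normal, m0), (suspicious, m1)):
        for r in rows:
            d = (r[0] - m[0], r[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    # Closed-form inverse of the 2x2 scatter matrix (assumed non-singular)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = (m1[0] - m0[0], m1[1] - m0[1])
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Decision threshold halfway between the projected class means
    threshold = (project(w, m0) + project(w, m1)) / 2
    return w, threshold


def quarantine(devices, w, threshold):
    """IDs of devices whose projection lies on the suspicious side of the threshold."""
    return [dev_id for dev_id, x in devices.items() if project(w, x) > threshold]


# Hypothetical labelled readings (two features per device) used to fit the axis
normal = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [1.1, 2.2]]
suspicious = [[3.0, 0.5], [3.2, 0.7], [2.9, 0.4], [3.1, 0.6]]
w, threshold = fisher_direction(normal, suspicious)

devices = {"sensor-a": [1.0, 2.0], "sensor-b": [3.0, 0.6], "sensor-c": [1.1, 1.9]}
flagged = quarantine(devices, w, threshold)
print(flagged)  # ['sensor-b']
```

Reducing each device to a single discriminant score keeps the per-device check cheap on an edge node; in the described model, flagged devices would then go through the reputation-scoring stage before being released or having their data sanitized.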

A Comparative Analysis of Machine Learning Algorithms for Intrusion Detection in Edge-Enabled IoT Networks

Poornima Mahadevappa, Syeda Mariam Muzammal, Raja Kumar Murugesan

Category: Machine Learning

2021-11-02

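The significant increase in the number of interconnected devices and in the volume of data communicated over wireless networks has given rise to various threats, risks, and security concerns. Internet of Things (IoT) applications are deployed in almost every field of daily life, including sensitive environments. The edge computing paradigm has complemented IoT applications by moving computational processing closer to the data sources. Among various security models, Machine Learning (ML) based intrusion detection is the most conceivable defense mechanism to combat anomalous behavior in edge-enabled IoT. ML algorithms are used to classify network traffic as normal or malicious. Intrusion detection is one of the challenging problems in the field of network security, and the research community has proposed many intrusion detection systems. However, the challenge of selecting suitable algorithms to provide security in edge-enabled IoT networks remains. In this paper, a comparative analysis of conventional machine learning classification algorithms has been performed to classify network traffic on the NSL-KDD dataset, using Jupyter on the PyCharm tool. It can be observed that the Multi-Layer Perceptron (MLP) has dependencies between input and output and relies more on the network configuration for intrusion detection. Therefore, MLP can be more suitable for edge-based IoT networks, with a better training time of 1.2 seconds and a testing accuracy of 79%.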
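
The comparison above rests on two numbers per classifier: training time and test accuracy. The following is a toy, hypothetical sketch of how such numbers can be measured for an MLP: a one-hidden-layer network in plain Python, trained with per-sample gradient descent on log loss, timed with `time.perf_counter`, and scored on held-out points. The two features stand in for normalized traffic attributes; this is not NSL-KDD, not the paper's network configuration, and its timings are not comparable to the reported 1.2 seconds.

```python
import math
import random
import time


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One hidden layer, sigmoid activations, per-sample SGD on binary log loss."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)  # fixed seed so runs are reproducible
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, y

    def fit(self, X, labels, epochs=500, lr=0.5):
        for _ in range(epochs):
            for x, t in zip(X, labels):
                h, y = self._forward(x)
                # Gradient of log loss w.r.t. the output pre-activation is (y - t)
                d_out = y - t
                # Backpropagate with the pre-update output weights; sigmoid' = h * (1 - h)
                d_hid = [d_out * w * hi * (1 - hi) for w, hi in zip(self.w2, h)]
                self.w2 = [w - lr * d_out * hi for w, hi in zip(self.w2, h)]
                self.b2 -= lr * d_out
                for j, dj in enumerate(d_hid):
                    self.w1[j] = [w - lr * dj * xi for w, xi in zip(self.w1[j], x)]
                    self.b1[j] -= lr * dj
        return self

    def predict(self, x):
        """1 = malicious, 0 = normal (threshold 0.5 on the output probability)."""
        return 1 if self._forward(x)[1] >= 0.5 else 0


# Hypothetical normalized traffic features: low values normal (0), high values attack (1)
train_X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.05, 0.1],
           [0.9, 0.8], [0.8, 0.95], [0.85, 0.9], [0.95, 0.85]]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]
test_X = [[0.12, 0.18], [0.88, 0.92]]
test_y = [0, 1]

start = time.perf_counter()
model = TinyMLP(n_in=2, n_hidden=4).fit(train_X, train_y)
train_seconds = time.perf_counter() - start
accuracy = sum(model.predict(x) == t for x, t in zip(test_X, test_y)) / len(test_y)
print(f"training time {train_seconds:.3f}s, test accuracy {accuracy:.0%}")
```

A real comparison would run the same timing and scoring loop over every candidate classifier on the same train/test split, so that only the algorithm varies.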

Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Patrick Godau, Veronika Cheplygina, Michal Kozubek, Sharib Ali

Category: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while receiving prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé

Category: Natural Language Processing

2022-11-09

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.

Disentangling Abstraction from Statistical Pattern Matching in Human and Machine Learning

Sreejan Kumar, Ishita Dasgupta, Raja Marjieh, Nathaniel D. Daw, Jonathan D. Cohen, Thomas L. Griffiths

Category: Artificial Intelligence

2022-04-04

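The ability to acquire abstract knowledge is a hallmark of human intelligence and is believed by many to be one of the core differences between humans and neural network models. Agents can be endowed with an inductive bias towards abstraction through meta-learning, where they are trained on a distribution of tasks that share some abstract structure that can be learned and applied. However, because neural networks are hard to interpret, it can be difficult to tell whether agents have learned the underlying abstraction or merely statistical patterns that are characteristic of that abstraction. In this work, we compare the performance of humans and agents in a meta-learning paradigm in which tasks are generated from abstract rules. We define a novel methodology for building "task metamers" that closely match the statistics of the abstract tasks but use a different underlying generative process, and we evaluate performance on both abstract and metamer tasks. In our first set of experiments, we find that humans perform better at abstract tasks than at metamer tasks, whereas a widely used meta-reinforcement learning agent performs worse on the abstract tasks than on the matched metamers. In a second set of experiments, we base the tasks on abstractions derived directly from empirically identified human priors. We use the same procedure to generate corresponding metamer tasks and see the same double dissociation between humans and agents. This work provides a foundation for characterizing differences between human and machine learning that can be used in future work towards developing machines with human-like behavior.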

Developing Successful Shared Tasks on Offensive Language Identification for Dravidian Languages

Bharathi Raja Chakravarthi, Dhivya Chinnappa, Ruba Priyadharshini, Anand Kumar Madasamy, Sangeetha Sivanesan, Subalalitha Chinnaudayar Navaneethakrishnan, Sajeetha Thavareesan, Dhanalakshmi Vadivel, Rahul Ponnusamy, Prasanna Kumar Kumaresan

Category: Natural Language Processing

2021-11-05

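With the rapid growth of mobile computing and web technologies, offensive language has become more prevalent on social networking platforms. Since offensive language identification in local languages is essential for moderating social media content, in this paper we work with three Dravidian languages, namely Malayalam, Tamil, and Kannada, which are under-resourced. We present an evaluation task at FIRE 2020 HASOC-DravidianCodeMix and at DravidianLangTech at EACL 2021, designed to provide a framework for comparing different approaches to this problem. This paper describes the data creation, defines the task, lists the participating systems, and discusses various methods.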

Dataset for Identification of Homophobia and Transophobia in Multilingual YouTube Comments

Bharathi Raja Chakravarthi, Ruba Priyadharshini, Rahul Ponnusamy, Prasanna Kumar Kumaresan, Kayalvizhi Sampath, Durairaj Thenmozhi, Sathiyaraj Thangasamy, Rajendran Nallathambi, John Phillip McCrae

Category: Natural Language Processing

2021-09-01

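The growth of abusive content on social media platforms increases its negative impact on online users. The fear of, dislike of, discomfort with, or mistrust of lesbian, gay, transgender, or bisexual persons is defined as homophobia/transphobia. Homophobic/transphobic speech is a type of offensive language that can be summarized as hate speech directed toward LGBT+ people, and it has attracted growing interest in recent years. Online homophobia/transphobia is a severe societal problem that can make online platforms toxic and unwelcoming to LGBT+ people while also undermining equality, diversity, and inclusion. We provide a new taxonomy for online homophobia and transphobia, as well as an expert-labeled dataset that will allow homophobic/transphobic content to be identified automatically. We educated the annotators and provided them with comprehensive annotation rules, because this is a sensitive issue and we had previously found that untrained crowdsourced annotators misdiagnosed advocacy groups due to cultural and other biases. The dataset contains 15,141 annotated multilingual comments. This paper describes the process of building the dataset, a qualitative analysis of the data, and inter-annotator agreement. In addition, we create baseline models for the dataset. To the best of our knowledge, our dataset is the first of its kind. Warning: this paper contains explicit statements of homophobia, transphobia, and stereotypes, which may be distressing to some readers.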
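
The abstract reports inter-annotator agreement without naming the statistic here. As one common example (an assumption, not necessarily the measure the authors used), Cohen's kappa for two annotators compares observed agreement p_o with the agreement p_e expected by chance from each annotator's label frequencies: kappa = (p_o - p_e) / (1 - p_e). The labels below are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Undefined (division by zero) when p_e == 1, e.g. both annotators
    use one identical label for every item.
    """
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' marginal label frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for six comments from two annotators
ann_1 = ["none", "homophobic", "none", "transphobic", "none", "homophobic"]
ann_2 = ["none", "homophobic", "none", "none", "none", "homophobic"]
print(round(cohens_kappa(ann_1, ann_2), 3))  # 0.7
```

Here the annotators agree on 5 of 6 comments (p_o = 5/6), but chance alone would give p_e = 16/36, so kappa credits only the agreement beyond chance. Datasets with more than two annotators per item usually use a multi-rater statistic such as Fleiss' kappa or Krippendorff's alpha instead.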

e-Inu: Simulating A Quadruped Robot With Emotional Sentience

Abhiruph Chakravarty, Jatin Karthik Tripathy, Sibi Chakkaravarthy S, Aswani Kumar Cherukuri, S. Anitha, Firuz Kamalov, Annapurna Jonnalagadda

Category: Robotics | Machine Learning

2023-01-03

Quadruped robots are currently used in industrial robotics as mechanical aids to automate several routine tasks. However, the use of such robots in a domestic setting is still very much a research topic. This paper discusses the understanding and virtual simulation of such a robot, capable of detecting and understanding human emotions, generating its gait, and responding via sounds and expressions on a screen. To this end, we use a combination of reinforcement learning and software engineering concepts to simulate a quadruped robot that can understand emotions, navigate through various terrains, detect sound sources, and respond to emotions using audio-visual feedback. This paper aims to establish a framework for simulating a quadruped robot that is emotionally intelligent and can primarily respond to audio-visual stimuli using motor or audio responses. Emotion detection from speech was not as performant as ERANNs or Zeta Policy learning, but still achieved an accuracy of 63.5%. The video emotion detection system produced results almost on par with the state of the art, with an accuracy of 99.66%. Due to its "on-policy" learning process, the PPO algorithm learned extremely rapidly, allowing the simulated dog to demonstrate a remarkably seamless gait across different cadences and variations. This enabled the quadruped robot to respond to generated stimuli, allowing us to conclude that it functions as predicted and satisfies the aim of this work.
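
The abstract credits PPO for the fast gait learning. For background, the standard PPO clipped surrogate objective (general PPO, not code from this paper) limits how far one update can move the policy: for each sample, with probability ratio r = pi_new / pi_old and advantage estimate A, it takes min(r * A, clip(r, 1 - eps, 1 + eps) * A). A minimal sketch, with the common clip range eps = 0.2 as an assumed value and a hypothetical batch:

```python
def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Mean clipped surrogate objective E[min(r*A, clip(r, 1-eps, 1+eps)*A)].

    PPO maximises this value (equivalently, minimises its negative as a loss).
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Keep the ratio inside [1 - eps, 1 + eps]
        clipped_r = min(max(r, 1.0 - epsilon), 1.0 + epsilon)
        # The pessimistic (smaller) term removes any gain from moving r past the clip range
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


# Hypothetical batch: the first sample's action became 50% more likely under the new policy
ratios = [1.5, 0.9, 1.0]
advantages = [2.0, -1.0, 0.5]
print(ppo_clip_objective(ratios, advantages))
```

In the first sample the unclipped term would be 3.0, but clipping caps it at 1.2 * 2.0 = 2.4, which is what keeps each policy update conservative.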

NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory

Santhosh Kumar Ramakrishnan, Ziad Al-Halah, Kristen Grauman

Category: Computer Vision

2023-01-02

Searching long egocentric videos with natural language queries (NLQ) has compelling applications in augmented reality and robotics, where a fluid index into everything that a person (agent) has seen before could augment human memory and surface relevant information on demand. However, the structured nature of the learning problem (free-form text query inputs, localized video temporal window outputs) and its needle-in-a-haystack nature make it both technically challenging and expensive to supervise. We introduce Narrations-as-Queries (NaQ), a data augmentation strategy that transforms standard video-text narrations into training data for a video query localization model. Validating our idea on the Ego4D benchmark, we find it has tremendous impact in practice. NaQ improves multiple top models by substantial margins (even doubling their accuracy), and yields the very best results to date on the Ego4D NLQ challenge, soundly outperforming all challenge winners in the CVPR and ECCV 2022 competitions and topping the current public leaderboard. Beyond achieving the state of the art for NLQ, we also demonstrate unique properties of our approach, such as gains on long-tail object queries and the ability to perform zero-shot and few-shot NLQ.
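
NaQ turns timestamped narrations into (query, temporal window) training pairs for an NLQ localization model. The following is a hypothetical sketch of that conversion only: it assumes each narration carries a single timestamp and places a fixed-width window around it, clamped to the video bounds. The `half_width` value and the class and function names are illustrative assumptions; the abstract does not describe how the paper actually constructs the windows.

```python
from dataclasses import dataclass


@dataclass
class Narration:
    timestamp: float  # seconds into the video
    text: str


@dataclass
class QuerySample:
    query: str    # narration text reused as a natural-language query
    start: float  # target window start (seconds)
    end: float    # target window end (seconds)


def narrations_to_queries(narrations, video_duration, half_width=2.0):
    """Turn each narration into a query whose target is a window around its timestamp."""
    samples = []
    for n in narrations:
        # Clamp the window so it stays inside the video
        start = max(0.0, n.timestamp - half_width)
        end = min(video_duration, n.timestamp + half_width)
        samples.append(QuerySample(query=n.text, start=start, end=end))
    return samples


# Hypothetical narrations for a 60-second clip
narrations = [Narration(1.0, "C picks up the knife"), Narration(59.0, "C opens the fridge")]
for sample in narrations_to_queries(narrations, video_duration=60.0):
    print(sample)
```

The appeal of this kind of augmentation is that narrations already exist at scale, so every narration becomes an extra supervised query without new temporal-window annotation.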

Statistical Machine Translation for Indic Languages

Sudhansu Bala Das, Divyajoti Panda, Tapas Kumar Mishra, Bidyut Kr. Patra

Category: Natural Language Processing

2023-01-02

Machine Translation (MT) systems generally aim to automatically render a source language into a target language while retaining the original context, using various Natural Language Processing (NLP) techniques. One such method is Statistical Machine Translation (SMT), which uses probabilistic and statistical techniques to analyze information and perform the conversion. This paper covers the development of bilingual SMT models for translating English to fifteen low-resource Indian Languages (ILs) and vice versa. At the outset, each of the 15 languages is briefly described as it relates to our experimental needs. Further, as part of our experiment, a detailed analysis is done of the Samanantar and OPUS datasets for model building, along with a standard benchmark dataset (Flores-200) for fine-tuning and testing. Different preprocessing approaches are proposed in this paper to handle noise in the datasets. To create the system, the MOSES open-source SMT toolkit is explored. Distance-based reordering is used, with the aim of capturing grammar rules and context-dependent adjustments through a phrase reordering categorization framework. In our experiment, translation quality is evaluated using standard metrics such as BLEU, METEOR, and RIBES.
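
In phrase-based SMT, a distance-based reordering model typically penalizes each phrase by how far its source span jumps from the end of the previously translated phrase, |start_i - end_(i-1) - 1|, so that strictly monotone translation costs nothing. A minimal sketch of that standard formulation, assuming 1-indexed source word positions; the function name is hypothetical and this is not MOSES's internal code:

```python
def distortion_cost(source_spans):
    """Total distance-based reordering cost for phrases in target-side order.

    Each span is (start, end) over 1-indexed source word positions. A phrase
    costs |start_i - end_{i-1} - 1|; the first phrase is measured from end_0 = 0.
    """
    cost = 0
    prev_end = 0
    for start, end in source_spans:
        cost += abs(start - prev_end - 1)
        prev_end = end
    return cost


# Monotone order: each phrase starts right after the previous one ends
print(distortion_cost([(1, 2), (3, 4)]))  # 0
# Swapped order, e.g. when the target language places these phrases in reverse
print(distortion_cost([(3, 4), (1, 2)]))  # 6
```

Because the cost grows only with jump distance, this model prefers translations that keep source order and lets the language model justify any reordering, which is one reason it is often paired with richer lexicalized reordering for language pairs with very different word order.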
